Merge with upstream #271

dhil · 2024-12-20T15:49:37Z

No description provided.

…ance#9794) This commit first fixed an issue with table access codegen to disable spectre mitigations on Pulley targets like how spectre is disabled for memory accesses as well. This unblocked many tests related to tables which then led to a different error about a `trapnz` with an 8-bit value not being supported. In fixing `trapnz` with 8-bit values this PR went ahead and did a general-purpose refactoring for how conditional branches are managed. Previously conditional traps and conditional branches had some duplicated logic and the goal was to unify everything. There is now a single `Cond` which represents the condition of a conditional jump which is used uniformly for all locations such as `select`, `brif`, and `trap[n]z`. This new type represents all the sorts of conditional branches that can be done in Pulley, for example integer comparisons and whether or not a register is zero. This `Cond` type has various helpers for printing it, inverting it, collecting operands, emission, etc. The end result is that it's a bit wordy to work with `Cond` right now due to the size of the variants but all locations working with conditional traps are deduplicated and now it's just repetitive logic rather than duplicated logic. Putting all of this together gets a large batch of spec tests working. I'll note that this does remove a feature where `trapnz` was turned into nothing or an unconditional trap if the argument was a constant, but that feels like an optimization perhaps best left for the middle-end rather than doing it in the backend. cc bytecodealliance#9783

Implement float-to-int bitcasts as well as the xor operation for integers. cc bytecodealliance#9783

) * cranelift: Round inline stack probes down, not up When we have `enable_probestack` turned on and set `probestack_strategy` to "inline", we have to compute how many pages of the stack we'll probe. The current implementation rounds our stack frame size up to the nearest multiple of the page size, then probes each page once. However, if our stack frame is not a multiple of the page size, that means there's a partial page at the end. It's not necessary to probe that partial page, just like it's unnecessary to probe at all if the frame is smaller than one page. Either way, any signal handler needs to be prepared for stack accesses on that last page to fault at any time during the function's execution. * Add comments explaining why we round down. * Port the round-down code to s390x too. * Update s390x expected outputs. --------- Co-authored-by: Dan Gohman <dev@sunfishcode.online>

* Use the C API as the example instead of the `wasmtime` CLI now that Cranelift can be disabled. * Update custom platform docs to talk about `signals-based-traps`.

* pulley: Get `strings.wast` test passing Needed the `imul` CLIF instruction to get implemented so 32/64-bit multiplication have now been added. cc bytecodealliance#9783 * Flag test as now passing

…ance#9805)

…ance#9803) * pulley: Fill out remaining 32/64-bit integer operations This required some extra plumbing to shepherd the precise reason why signed division trapped to Wasmtime which is done through an extra `TrapKind` side channel now added. This then additionally fixes the signed remainder interpreter function to return 0 on `MIN % -1` which is different from what Rust specifies (which is to return `None` or panic). cc bytecodealliance#9783 * Reduce some code duplication * Fix pulley tests * Fix MSRV compat

* pulley: Implement float<->int conversions Gets the `conversions.wast` test running along with a few other misc ones. cc bytecodealliance#9783 * Fix pulley's no_std build * One more conversion to a workspace dep

* Update to WASI 0.2.3 WIT files No major changes here; this just updates WIT files from WASI 0.2.2 to 0.2.3. * Update the vendor-wit.sh script to 0.2.3.

* pulley: Implement fcopysign for issue4890.wast * Fix #[no_std] issue + feedback

* pulley: Move `fp`/`lr` out of `XReg` set This commit moves the `fp` and `lr` registers out of the `XReg` register set and into the `MachineState` per-VM. These are automatically modified and read with `push_frame` and `pop_frame`. Dedicated `xmov_{fp,lr}` instructions were added for use in Wasmtime's trampolines which directly read these registers. * Fix pulley tests * Free up `spilltmp1` register Also unused in CLIF

* pulley: Implement float arithmetic operations Fill out enough to get `f32.wast` and `f64.wast` spec tests working. A minor ABI issue was discovered along the way which is also required to get a new test working on both 32 and 64-bit platforms. cc bytecodealliance#9783 * Centralize handling of float operations Add a new `wasmtime-math` crate which is tasked with providing the float math functions needed by Wasm with with, in theory, most optimal platform implementation available. * Fix riscv64 tests

* pulley: Get `call_indirect.wast` spec test working Fix a few typos here and there related to floats to get some tests passing. cc bytecodealliance#9783 * Update CLIF test expectations

This hasn't actually turned up anything in quite some time and MIRI testing is relatively slow so by default don't test MIRI on Pulley PRs (it's of course still tested on the merge queue though)

) Fill out some more misc float ops. cc bytecodealliance#9783

Fill out a lowering for CLIF's `ineg` instruction. cc bytecodealliance#9783

Gets some `embenchen_*.wast` tests passing. cc bytecodealliance#9783

Gets another `*.wast` test passing cc bytecodealliance#9783

…9818) Forcibly disable the `threads` proposal for Pulley. This is done because I don't believe there's any way that we can, in Rust, implement non-atomic loads and stores in a manner that isn't undefined behavior. Until this is available leave this proposal as flagged as unsupported.

* pulley: Initial scaffold of SIMD support This commit fills out some of the initial infrastructure necessary for supporting the SIMD proposal to WebAssembly in the Pulley interpreter, namely 128-bit simd. The `VRegVal` union has been filled out with various types, endianness questions are settled, and initial implementations of a suite of opcodes are added to get a basic set of tests working throughout the backend. cc bytecodealliance#9783 * Avoid dealing with big-endian vectors * Change wasm `global`s to store `v128` in little-endian format. * Change pulley stack loads/stores to work with vectors in little-endian format.

This commit adds the pulley targets to many of the preexisting `*.clif` runtests throughout the tree. This covers most of the MVP functionality of wasm for example and additionally exercises 8 and 16-bit lowerings for many instructions. A few minor pulley instructions were added and otherwise new 8/16-bit lowerings use existing instructions. It's expected that the 8/16-bit lowerings won't be used all that often so they're not particularly optimal at this time. Some CLIF tests were omitted such as: * Most SIMD-using CLIF tests * Float/int conversion tests using 8 and 16-bit integers * Tests with `call` instructions as relocations don't work with the JIT crate on Pulley * Tests using 128-bit integers Support for some of these tests may be enabled in the future, but for example 8/16-bit integers may not get used much.

…liance#9829) * Add an inheritance test * Only allow GCing tags we (mostly) know are safe to GC Make the conservatively correct choice the default one.

…liance#9824) It seems that this was removed in bytecodealliance#9381 The new method is really nice for controlling this at runtime, however the docs still mentioned the old behavior.

This is not directly reachable from wasm but can be created through optimizations.

* pulley: Implement `return_call` instructions This commit fleshes out the Cranelift lowerings of tail calls which gets the wasm tail call proposal itself working on Pulley. Most of the bits and pieces here were copied over from the riscv64 backend and then edited to suit Pulley. * Fix table64 addressing on 32-bit

…liance#9821) * allow customizing log prefixes for wasmtime serve command Signed-off-by: Joseph Zhang <jzhang@absolute.com> * pr feedback - use simple boolean flag instead --------- Signed-off-by: Joseph Zhang <jzhang@absolute.com>

* pulley: Implement SIMD `splat` instruction Gets a few spec tests and CLIF tests passing cc bytecodealliance#9783 * Fix typo

Fill out vector load-and-extend instructions.

Fill out some bit-related operations for vector registers.

Fill out some bitmask/test instructions for vectors.

* emit 32bit udiv * winch: aarch64 udiv/urem without extension * remove stray dbg! * fmt * remove println * fix formatting in ISLE * Sized TrapIf * move operand size into CondBrKind variant * show_reg_sized fallback

* pulley: Get `simd_conversions.wast` test working Lots of narrowing/extending/conversion-related opcodes implemented. Note that these opcodes are all in the "extended" namespace as the 1-byte namespace has started to overflow. * Fill out some TODO

…9843) Gets a number of `*.wast` tests passing.

Gets some misc wast tests passing.

* Move memories first in `VMContext` This commit shuffles the fields of `VMContext` to move memories first above all other items. This is done to ensure that the offset to memory-related fields is one of the smaller offsets and ideally fits within an 8-byte offset which helps fit into some special cases on various ISAs. For example x64 has a smaller instruction encoding for 8-bit offsets than 32-bit offsets, and I'm planning on doing a similar optimization for Pulley. * Review comments

* pulley: Add basic i128 comparison to backend Don't add special opcodes yet as this probably isn't performance critical in an interpreted context, but it should always be possible to add such opcodes in the future. * Review comments

…e#9845) * Optimize constant offset loads/stores in translation This commit implements an optimization when lowering Wasm bytecode to CLIF to skip a bounds check when the offset in memory is statically known. This comes up in C/C++/Rust when `static` memory is addressed for example where LLVM emits an `i32.const 0` as the base address and puts the address of the variable in the `offset` instruction field. This isn't necessary for 64-bit platforms but when explicit bounds-checks are required this can help to eliminate a constant-true bounds-check. * Review comments

…9852) No new pulley opcodes here, just reusing preexisting opcodes.

Pushes more into the 32-bit offset on load/store instructions.

More wast tests passing.

…#9854) * pulley: Shuffle opcodes to free some 1-byte opcodes In the near future the set of opcodes here are going to be expanded along a number of axes such as: * More modes of addressing loads/stores other than just `reg + offset32`. * Opcodes with immediate operands rather than unconditional register operands. The 1-byte opcode namespace is already filling up and there's a bit of a mishmash of what's 1-byte and what's "extended" for now. To help bridge this gap in the interim shuffle all float/vector-related opcodes into the "extended" opcode namespace. This frees up a large chunk of the 1-byte opcode namespace for future expansion with extensions like above. I'll note that I'm not 100% sure that the opcodes will all stay here after this reshuffling. I haven't done any profiling/performance analysis to gauge the impact of this change. The immediate goal is to start experimenting with non-float/vector programs and get them profiling well. This will require more "macro opcodes" such as new addressing modes and opcodes-with-immediates. These are expected to be relatively hot opcodes and thus probably want to be in the "fast" 1-byte namespace, hence the shuffling here. My plan is to in the future do a bit of an evaluation with a float/vector program and see whether it make sense to shuffle some of them into this 1-bytecode space as well. More radically it might make sense to remove the split between ops/extended ops and instead just have a 2-byte opcode space entirely. That's all left for future evaluations though. * Fix test offsets * Update test expectations

This commit extends the pulley opcode space with integer addition/subtraction where `src2` is an immediate. The goal here is to be a "sort of macro instruction" despite it not being too too macro here. This cuts down on `xconstN` instructions which both saves space in the final binary and should be slightly more optimal perf-wise due to only one dispatch being needed. In this commit the `xadd32` instruction is previously 3 bytes: one for an opcode and 2 bytes for the dst/src1/src2 binary operands. Adding a small constant to a register previously took 5 bytes where 2 bytes were needed for `xconst8 N` then 3 for the addition. Here the encoding size of the new instruction is 4 bytes: 1 for the opcode, 2 for dst/src1, and one for the immediate. This is currently chosen to mostly optimize dispatch in the interpreter loop as opposed to code size (as only a single byte is saved). In the future thought it would be possible to extend `BinaryOperands` to one operand being a 6-bit immediate to preserve the same code size. This also notably adds, for addition/subtraction, only unsigned immediates. With addition/subtraction being inverses of one another supporting signed immediates isn't necessary and helps free up another bit for packing numbers into these opcodes. This change reduces the size of `spidermonkey.cwasm` from 31M to 29M locally.

* add Default and Debug impls to SparseMap * more meaningful debug impl

This commit extends the set of opcodes to load/stores from memory with integer registers. Previously the only addressing mode supported was a base register plus a 32-bit signed immediate. This immediate frequently doesn't need 32-bits though and can often fit in a much smaller range. Looking at `spidermonkey.cwasm` a large number of loads/stores can fit within an unsigned 8-bit integer instead so this commit adds an `offset8` mode in addition to the preexisting `offset32` mode. Empirically this commit shrinks `spidermonkey.cwasm` for pulley64 from 33M to 31M. This notably, at this time, does not extend general addressing modes in Pulley nor does it extend all loads/stores. For example float/vector/big-endian loads and stores all continue to only support a 32-bit signed offset from the base pointer. This is done under the assumption that integer loads/stores dominate both performance/code-size, but this is not empirically proven just yet. Additionally at this time the choice is being made to add an opcode-per-addressing-mode rather than having a single load opcode take a general addressing mode. The assumption here is that decoding a fully general addressing mode and processing it is probably slower at runtime than specializing opcodes per addressing mode. This is currently an unproven assumption however and the cost of this is increased complexity in the Cranelift backend as it has to have many branches for all loads/stores supported.

…ce#9862) * pulley: Remove unwind metadata from Cranelift backend This is a copy/paste from the riscv64 backend originally but there's no need to integrate with native unwinders on Pulley so there's no need to track this information. This removes the `Unwind` pseudo-inst entirely. * Fix CI and review comments

* pulley: Add immediate payloads to more opcodes This commit adds immediate payloads to the following instructions: * `xmul32` - `xmul32_s8` / `xmul32_s32` * `xmul64` - `xmul64_s8` / `xmul64_s32` * `xband32` - `xband32_s8` / `xband32_s32` * `xband64` - `xband64_s8` / `xband64_s32` * `xbor32` - `xbor32_s8` / `xbor32_s32` * `xbor64` - `xbor64_s8` / `xbor64_s32` * `xbxor32` - `xbxor32_s8` / `xbxor32_s32` * `xbxor64` - `xbxor64_s8` / `xbxor64_s32` * `xshl32` - `xshl32_u6` * `xshl64` - `xshl64_u6` * `xshr32_u` - `xshl32_u_u6` * `xshr64_u` - `xshl64_u_u6` * `xshr32_s` - `xshl32_s_u6` * `xshr64_s` - `xshl64_s_u6` For shifts there's no need to have 32-bit immediates (or even 8-bit) since 6 bits is enough to encode all the immediates. This means that the 6-bit immediate is packed within `BinaryOperands` as a new `U6` type. This commit unfortunately does not shrink `spidermonkey.cwasm` significantly beyond the prior 29M. This is nevertheless expected to be relatively important for performance. * Fix test expectations

) This commit adds a large number of new `br_if_x*` instructions which compare with an immediate instead of comparing two registers. This is pretty common in wasm/compiled code where, for example, loop upper bounds are often constants. This helps compress code slightly while fusing more instructions together. The main cost of this is that the number of opcodes added here is quite large. Like with previous immediate-taking opcodes both 8 and 32-bit variants of immediates are added for all comparisons. Additionally unlike the previous set of branch-and-compare instructions it's required to add instructions for `>` and `>=` because the operands cannot be swapped to invert the condition, further increasing the number of opcodes added. This is a mild size reduction on `spidermonkey.cwasm` from 29M to 28M but it's mostly expected to be a performance win for interpreted loops.

…dealliance#9864) * pulley: Add macro instructions for function prologue/epilogue This commit adds two new instructions to Pulley to combine the operations of setting up a frame, allocating stack, and saving clobbered registers. This is all combined into a single instruction which is relatively large but is much smaller than each of these individual operations exploded out. This is a size win on `spidermonkey.cwasm` by about 1M and locally in a small `fib.wat` test this is also a good speedup by reducing the number of instructions executed. * Review comments and update test expectations

Filling out some more miscellaneous `*.wast` tests

* Test DWARF with --release This would have caught the issue in CI. It is also closer to what the end user will use. * [prtest:debug] Use volatile for keep-alive * Drop __attribute__((retain)) * Run tests in debug as well * Remove debug testing; add comment

* rename put_non_zero_in_reg to put_nonzero_in_reg_maybe_zext * handle 32bit sdiv * i32 rem_u/s * fix tests * fix rem * fmt * improve load_constant_full * fix tests, and review edits

This commit fixes a few mistakes that were introduced in bytecodealliance#9863. Specifically when lowering `Cond.If...` and the arguments needed swapping the condition was also inverted by accident. More `*.clif` runtests were added to catch this case and expose it. Additionally Pulley now has lowering for all the `FloatCC` orderings to be able to run the `select.clif` runtest which primarily exposed the issue.

* pulley: Implement some float simd ops Gets a few more wast tests passing * Enable some cranelift runtests

Refactor some existing usage of `pulley_xconst*` instructions to instead use instructions-taking-immediates instead now that they've been added to Pulley.

alexcrichton and others added 30 commits December 12, 2024 14:08

pulley: Get fused.wast passing (bytecodealliance#9797)

e130721

Implement float-to-int bitcasts as well as the xor operation for integers. cc bytecodealliance#9783

Update minimal build documentation (bytecodealliance#9802)

5b84c89

* Use the C API as the example instead of the `wasmtime` CLI now that Cranelift can be disabled. * Update custom platform docs to talk about `signals-based-traps`.

pulley: Get strings.wast test passing (bytecodealliance#9801)

0058cb7

* pulley: Get `strings.wast` test passing Needed the `imul` CLIF instruction to get implemented so 32/64-bit multiplication have now been added. cc bytecodealliance#9783 * Flag test as now passing

cli: add http outgoing body options (bytecodealliance#9800)

30596e6

Fix some --target mistakes from bytecodealliance#9802 (bytecodealli…

a6eb6f0

…ance#9805)

pulley: Implement float<->int conversions (bytecodealliance#9804)

3ec924f

* pulley: Implement float<->int conversions Gets the `conversions.wast` test running along with a few other misc ones. cc bytecodealliance#9783 * Fix pulley's no_std build * One more conversion to a workspace dep

Update to WASI 0.2.3 WIT files (bytecodealliance#9807)

74cba58

* Update to WASI 0.2.3 WIT files No major changes here; this just updates WIT files from WASI 0.2.2 to 0.2.3. * Update the vendor-wit.sh script to 0.2.3.

pulley: Implement fcopysign for issue4890.wast (bytecodealliance#9810)

f138aaa

* pulley: Implement fcopysign for issue4890.wast * Fix #[no_std] issue + feedback

pulley: Get call_indirect.wast spec test working (bytecodealliance#9809)

a30dce2

* pulley: Get `call_indirect.wast` spec test working Fix a few typos here and there related to floats to get some tests passing. cc bytecodealliance#9783 * Update CLIF test expectations

Exclude Pulley from MIRI testing on PRs (bytecodealliance#9816)

d68cb17

This hasn't actually turned up anything in quite some time and MIRI testing is relatively slow so by default don't test MIRI on Pulley PRs (it's of course still tested on the merge queue though)

pulley: Get f{32,64}_bitwise.wast tests working (bytecodealliance#9814

c18ca21

) Fill out some more misc float ops. cc bytecodealliance#9783

pulley: Get switch.wast spec test passing (bytecodealliance#9815)

cc0df2f

Fill out a lowering for CLIF's `ineg` instruction. cc bytecodealliance#9783

pulley: Fill out bnot lowering (bytecodealliance#9817)

b4a6d99

Gets some `embenchen_*.wast` tests passing. cc bytecodealliance#9783

pulley: Fill out lowerings for {s,u}{min,max} (bytecodealliance#9819)

da93f64

Gets another `*.wast` test passing cc bytecodealliance#9783

Pulley: implement support for bmask (bytecodealliance#9827)

54236e0

[DWARF] Make the GC of child tags conservative by default (bytecodeal…

e4abe55

…liance#9829) * Add an inheritance test * Only allow GCing tags we (mostly) know are safe to GC Make the conservatively correct choice the default one.

docs: remove old features flag from bindgen documentation (bytecodeal…

1956401

…liance#9824) It seems that this was removed in bytecodealliance#9381 The new method is really nice for controlling this at runtime, however the docs still mentioned the old behavior.

pulley: Lower umulhi and smulhi in CLIF (bytecodealliance#9830)

6ed283f

This is not directly reachable from wasm but can be created through optimizations.

pulley: Implement SIMD splat instruction (bytecodealliance#9832)

db4bd21

* pulley: Implement SIMD `splat` instruction Gets a few spec tests and CLIF tests passing cc bytecodealliance#9783 * Fix typo

pulley: Get simd_align.wast test passing (bytecodealliance#9838)

1f27065

Fill out vector load-and-extend instructions.

alexcrichton and others added 28 commits December 17, 2024 16:33

pulley: Get simd_bitwise.wast test passing (bytecodealliance#9839)

a5ae223

Fill out some bit-related operations for vector registers.

Update lang-ruby.md (bytecodealliance#9841)

de17231

pulley: Get simd_boolean.wast test passing (bytecodealliance#9840)

b2f160c

Fill out some bitmask/test instructions for vectors.

aarch64: support udiv for 32bit integers (bytecodealliance#9798)

031a28a

* emit 32bit udiv * winch: aarch64 udiv/urem without extension * remove stray dbg! * fmt * remove println * fix formatting in ISLE * Sized TrapIf * move operand size into CondBrKind variant * show_reg_sized fallback

pulley: Implement simd extractlane CLIF lowering (bytecodealliance#…

e2c22d5

…9843) Gets a number of `*.wast` tests passing.

pulley: Implement CLIF lowering of insertlane (bytecodealliance#9844)

0758679

Gets some misc wast tests passing.

pulley: Implement scalar_to_vector CLIF lowering (bytecodealliance#…

56db1b6

…9852) No new pulley opcodes here, just reusing preexisting opcodes.

pulley: Fold iadd a, $N into amode (bytecodealliance#9846)

8d77205

Pushes more into the 32-bit offset on load/store instructions.

pulley: Implement integer vector comparisons (bytecodealliance#9853)

11e238e

More wast tests passing.

Add Default and Debug impls to SparseMap (bytecodealliance#9860)

ab325dc

* add Default and Debug impls to SparseMap * more meaningful debug impl

pulley: Implement simd vector negation (bytecodealliance#9865)

c81b6a9

Filling out some more miscellaneous `*.wast` tests

cranelift: 32bit div_s, rem_u, rem_s for aarch64 (bytecodealliance#9850)

a179f95

* rename put_non_zero_in_reg to put_nonzero_in_reg_maybe_zext * handle 32bit sdiv * i32 rem_u/s * fix tests * fix rem * fmt * improve load_constant_full * fix tests, and review edits

pulley: Implement some float simd ops (bytecodealliance#9869)

99c5eb8

* pulley: Implement some float simd ops Gets a few more wast tests passing * Enable some cranelift runtests

pulley: Use immediate-taking instructions more (bytecodealliance#9871)

adcaed3

Refactor some existing usage of `pulley_xconst*` instructions to instead use instructions-taking-immediates instead now that they've been added to Pulley.

Merge with upstream

8213d9b

dhil merged commit bd991b2 into wasmfx:main Dec 20, 2024
43 checks passed

dhil deleted the wasmfx-merge branch December 20, 2024 21:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge with upstream #271

Merge with upstream #271

dhil commented Dec 20, 2024

Merge with upstream #271

Merge with upstream #271

Conversation

dhil commented Dec 20, 2024